Combining multiple species distributions on one map with hexagons and points

Visualising multiple species distributions in a single figure can be difficult if there are areas where ranges overlap. In this post we introduce a method that allows users to get an overview of multiple species distributions in an area using a novel twist on the commonly used hexbin map.

Eukaryota
Animalia
Chordata
Aves
Summaries
Maps
Authors

Callum Waite

Shandiya Balasubramaniam

Published

December 18, 2023

Author

Callum Waite
Shandiya Balasubramaniam

Date

27 November 2023

Visualisations of species distributions can be very simple yet effective ways of conveying biological and ecological information, such as range, habitat, and relative population size.

Representing more than one species distribution in a single figure can be difficult, though, especially where there are areas of overlap. Points and colour-filled polygons will obscure each other even with a degree of transparency, while densities and shaded regions can only show one species at a time.

Here, we demonstrate a method to visualise distributions of multiple species with overlapping ranges on the same map, with only a small loss in resolution. The technique is a novel twist on the commonly used hexbin map: instead of using a colour fill to represent presence/absence or counts within each hexagon, we use multiple coloured points within each hexagon to represent presence/absence of species, allowing users to get a broad overview of how multiple species are distributed across an area.

This method requires a number of steps to build up the elements of the final figure:

The final figure will comprise a combination of the basemap, hex grid, and species points once these elements are created.

Let’s begin by loading the R packages we’ll be using.

library(galah)
library(ggtext)
library(ozmaps)
library(sf)
library(showtext)
library(stringr)
library(tidyverse)

We’ll use the {galah} package to download occurrence records from the Atlas of Living Australia (ALA). To do this, you’ll need to register your email address with the ALA, then pass it to {galah} using galah_config().

galah_config(email = "your-email@email.com")

Download data

Since our goal here is to map distributions of multiple species, we’ve chosen honeyeaters from the genus Melithreptus: this is a distinctive group of 7 small- to medium-sized, short-billed and square-tailed honeyeaters with overlapping distributions across Australia.

We can get taxonomic information about this group using atlas_species()

melithreptus <- galah_call() |>
  galah_identify("Melithreptus") |>
  atlas_species()

melithreptus
# A tibble: 7 × 10
  kingdom  phylum   class order         family genus species author species_guid
  <chr>    <chr>    <chr> <chr>         <chr>  <chr> <chr>   <chr>  <chr>       
1 Animalia Chordata Aves  Passeriformes Melip… Meli… Melith… (Viei… https://bio…
2 Animalia Chordata Aves  Passeriformes Melip… Meli… Melith… (Vigo… https://bio…
3 Animalia Chordata Aves  Passeriformes Melip… Meli… Melith… Gould… https://bio…
4 Animalia Chordata Aves  Passeriformes Melip… Meli… Melith… (Goul… https://bio…
5 Animalia Chordata Aves  Passeriformes Melip… Meli… Melith… (Less… https://bio…
6 Animalia Chordata Aves  Passeriformes Melip… Meli… Melith… (Goul… https://bio…
7 Animalia Chordata Aves  Passeriformes Melip… Meli… Melith… Gould… https://bio…
# ℹ 1 more variable: vernacular_name <chr>

… and then use this information to download occurrence records for the 7 species. We’ll apply the general data quality profile to filter out low quality records, pass in the list of species we’re interested in with galah_identify(), filter records to 20221 that also have spatial coordinates, and fall within one of the IBRA bioregions as a proxy for Australian records only.

species_occ <- galah_call() |>
  galah_apply_profile(ALA) |>
  galah_identify(melithreptus$species) |>
  galah_filter(year == 2022,
               !is.na(cl1048),  # IBRA bioregions
               !is.na(decimalLatitude),
               !is.na(decimalLongitude)) |>
  galah_select(decimalLatitude,
               decimalLongitude,
               species, 
               scientificName) |> 
  atlas_occurrences()

head(species_occ)
# A tibble: 6 × 4
  decimalLatitude decimalLongitude species                    scientificName    
            <dbl>            <dbl> <chr>                      <chr>             
1           -43.6             147. Melithreptus validirostris Melithreptus (Eid…
2           -43.6             147. Melithreptus validirostris Melithreptus (Eid…
3           -43.5             146. Melithreptus validirostris Melithreptus (Eid…
4           -43.5             147. Melithreptus affinis       Melithreptus (Mel…
5           -43.5             147. Melithreptus affinis       Melithreptus (Mel…
6           -43.5             147. Melithreptus validirostris Melithreptus (Eid…

Since we’re going to be performing a few spatial operations to assign species to hexagons, we’ll also convert the species_occ dataframe into a simple features object, with the latitude and longitude columns represented as points in a geometry column named occ_geometry.

species_occ_sf <- species_occ |>
  st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), 
           crs = 4326) |> 
  st_set_geometry("occ_geometry")

head(species_occ_sf)
Simple feature collection with 6 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 146.1362 ymin: -43.59761 xmax: 147.1435 ymax: -43.48602
Geodetic CRS:  WGS 84
# A tibble: 6 × 3
  species                    scientificName                    occ_geometry
  <chr>                      <chr>                              <POINT [°]>
1 Melithreptus validirostris Melithreptus (Eidopsarus… (146.8627 -43.59761)
2 Melithreptus validirostris Melithreptus (Eidopsarus… (146.8627 -43.59761)
3 Melithreptus validirostris Melithreptus (Eidopsarus… (146.1362 -43.50299)
4 Melithreptus affinis       Melithreptus (Melithrept… (147.1435 -43.49038)
5 Melithreptus affinis       Melithreptus (Melithrept… (146.9424 -43.48602)
6 Melithreptus validirostris Melithreptus (Eidopsarus… (146.9424 -43.48602)

Generate hex grid

Next, we’ll set up a grid of hexagons across Australia, which we’ll use as bins for plotting summaries of species occurrence. st_make_grid() has arguments for specifying the size, type, and orientation of polygons in the grid, and the grid is created to cover the bounding box of the supplied shapefile (here the ozmap_country shapefile). We’ll transform the projection to match the coordinate reference system we set for the species occurrence records above, and assign a unique identifier to each hexagon in a column named hex_id.

hex_grid <- st_make_grid(ozmap_country,
                         cellsize = 2,
                         what = "polygons",
                         square = FALSE,
                         flat_topped = TRUE) |> 
  st_as_sf() |> 
  st_set_geometry("hex_geometry") |> 
  st_transform(4326) |> 
  rowid_to_column(var = "hex_id")

Our grid of hexagons looks like this:

Remove empty hexes

You’ve probably noticed there are a lot of redundant hexagons in the grid we just created: not every terrestrial hexagon will contain an occurrence record, and we can confidently assume hexagons in the ocean will not contain records of honeyeaters.

We’ll remove these empty hexagons with a spatial join (which behaves similarly to dplyr::left_join() for spatial objects). This returns a dataframe that has all the information from our original occurrence download, where each row is a record of a species in a particular location, but instead of the point locations associated with each record, each record has now been matched to a hexagon from the grid we just created.

hex_with_species <- st_join(x = hex_grid, 
                            y = species_occ_sf,
                            join = st_intersects,
                            left = FALSE)

head(hex_with_species, n = 10)
Simple feature collection with 10 features and 3 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 145.9652 ymin: -44.63203 xmax: 148.2746 ymax: -42.63203
Geodetic CRS:  WGS 84
     hex_id                    species                          scientificName
32       32 Melithreptus validirostris Melithreptus (Eidopsarus) validirostris
32.1     32 Melithreptus validirostris Melithreptus (Eidopsarus) validirostris
32.2     32 Melithreptus validirostris Melithreptus (Eidopsarus) validirostris
32.3     32       Melithreptus affinis     Melithreptus (Melithreptus) affinis
32.4     32       Melithreptus affinis     Melithreptus (Melithreptus) affinis
32.5     32 Melithreptus validirostris Melithreptus (Eidopsarus) validirostris
32.6     32 Melithreptus validirostris Melithreptus (Eidopsarus) validirostris
32.7     32 Melithreptus validirostris Melithreptus (Eidopsarus) validirostris
32.8     32 Melithreptus validirostris Melithreptus (Eidopsarus) validirostris
32.9     32       Melithreptus affinis     Melithreptus (Melithreptus) affinis
                       hex_geometry
32   POLYGON ((145.9652 -43.6320...
32.1 POLYGON ((145.9652 -43.6320...
32.2 POLYGON ((145.9652 -43.6320...
32.3 POLYGON ((145.9652 -43.6320...
32.4 POLYGON ((145.9652 -43.6320...
32.5 POLYGON ((145.9652 -43.6320...
32.6 POLYGON ((145.9652 -43.6320...
32.7 POLYGON ((145.9652 -43.6320...
32.8 POLYGON ((145.9652 -43.6320...
32.9 POLYGON ((145.9652 -43.6320...

This means any hexagons we initially created in the grid that don’t intersect with occurrence records have been removed:

Visualising multiple species in a hexagon

As some hexagons will contain occurrence records for more than one species, we need a way to display these overlaps. We’ll do this by setting up 7 positions in each hexagon, 1 for each species, and assign each species a position and colour so they can be visually differentiated.

The figure below summarises the process we’ll follow: for each hexagon remaining in the grid, we’ll generate a smaller hexagon, then get the coordinates of each vertex and centroid of the smaller hexagon. This gives us 7 positions to display up to 7 species in each hexagon.

Set up 7 positions

Let’s start by extracting the unique identifiers and spatial coordinates for every hexagon containing an occurrence record2. Each hex_id refers to one of the remaining hexagons in our grid. This is step 1 from the figure above.

unique_hex <- hex_with_species |> 
  count(hex_id, hex_geometry) |> 
  select(-`n`)
unique_hex
Simple feature collection with 157 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 113.0562 ymin: -44.63203 xmax: 155.2028 ymax: -9.632027
Geodetic CRS:  WGS 84
First 10 features:
   hex_id                   hex_geometry
1      32 POLYGON ((145.9652 -43.6320...
2      50 POLYGON ((144.2331 -42.6320...
3      51 POLYGON ((147.6972 -42.6320...
4      70 POLYGON ((145.9652 -41.6320...
5      88 POLYGON ((144.2331 -40.6320...
6      89 POLYGON ((147.6972 -40.6320...
7     107 POLYGON ((142.5011 -39.6320...
8     108 POLYGON ((145.9652 -39.6320...
9     125 POLYGON ((140.769 -38.63203...
10    126 POLYGON ((144.2331 -38.6320...

Next, we’ll work through steps 2 - 4 from the figure above. We’ll create a smaller hexagon within each original hex using st_buffer(), extract the coordinates of its vertices using st_coordinates(), and assign an integer to each vertex ranging from 1 to 73. We’ve created an anonymous function to pipe these steps together, and used pmap() to apply this function iteratively to every hexagon in the grid.

vertex_coords <- unique_hex |> 
  mutate(vertices = pmap(
    .l = list(x = hex_geometry),
    .f = function(x) {
      x |>
        st_buffer(dist = -0.4) |>         # STEP 2: set size of smaller hex
        st_coordinates() |>               # STEP 3: get vertex coordinates of smaller hex        
        as_tibble() |>                    # convert from matrix to tibble  
        st_as_sf(coords = c("X", "Y")) |> # convert from tibble to simple features
        select(-L1, -L2) |>               # remove unnecessary columns
        mutate(vertex_position = 1:7)     # STEP 4: number vertices 
    })) |> 
  unnest(cols = vertices)

head(vertex_coords, n = 10)
Simple feature collection with 10 features and 2 fields
Active geometry column: hex_geometry
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 144.2331 ymin: -44.63203 xmax: 148.2746 ymax: -41.63203
Geodetic CRS:  WGS 84
# A tibble: 10 × 4
   hex_id                 hex_geometry             geometry vertex_position
    <int>                <POLYGON [°]>              <POINT>           <int>
 1     32 ((145.9652 -43.63203, 146.5… (146.4271 -43.63203)               1
 2     32 ((145.9652 -43.63203, 146.5… (146.7735 -43.03203)               2
 3     32 ((145.9652 -43.63203, 146.5… (147.4663 -43.03203)               3
 4     32 ((145.9652 -43.63203, 146.5… (147.8127 -43.63203)               4
 5     32 ((145.9652 -43.63203, 146.5… (147.4663 -44.23203)               5
 6     32 ((145.9652 -43.63203, 146.5… (146.7735 -44.23203)               6
 7     32 ((145.9652 -43.63203, 146.5… (146.4271 -43.63203)               7
 8     50 ((144.2331 -42.63203, 144.8…  (144.695 -42.63203)               1
 9     50 ((144.2331 -42.63203, 144.8… (145.0414 -42.03203)               2
10     50 ((144.2331 -42.63203, 144.8… (145.7342 -42.03203)               3

In the resulting dataframe, vertex_coords, the hex_id and hex_geometry columns refer to the unique identifier and geometry of the original large hexagons from the grid. Each of these is duplicated across 7 rows, and the geometry and vertex_position columns refer to the smaller hexagons we created within each large hexagon. geometry contains the spatial coordinates for each inner vertex position, and vertex_position identifies each point according to the figure above, starting from the left and moving clockwise.

It’s worth noting that we define the size of the smaller hexagon with the dist argument in st_buffer(), which changes based on the cellsize of the original hexagon we created (in the six-hexagon figure above, we set cellsize = 2). Depending on the number of species you’d like to fit within each polygon and the shape of the polygon you’ve chosen, you may need to try out different values of cellsize and dist to find combinations that work best for your visualisation.

We require seven positions in each hexagon (to fit a maximum of seven species) and conveniently have seven rows for each hexagon in vertex_coords (because vertex_position 1 and 7 represent the same vertex of the closed hexagon shape). We can therefore mutate the duplicated row of the 7th vertex to hold the coordinates of the centroid of each hexagon instead, giving us seven distinct positions (step 5 in the illustrated process above).

vertex_centroid_coords <- vertex_coords |> 
  mutate(geometry = ifelse(vertex_position == 7,      
                           st_centroid(hex_geometry), 
                           geometry)) |> 
  st_drop_geometry() |>                               
  st_as_sf(crs = 4326)

head(vertex_centroid_coords, n = 10)
Simple feature collection with 10 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 144.695 ymin: -44.23203 xmax: 147.8127 ymax: -42.03203
Geodetic CRS:  WGS 84
# A tibble: 10 × 3
   hex_id             geometry vertex_position
    <int>          <POINT [°]>           <int>
 1     32 (146.4271 -43.63203)               1
 2     32 (146.7735 -43.03203)               2
 3     32 (147.4663 -43.03203)               3
 4     32 (147.8127 -43.63203)               4
 5     32 (147.4663 -44.23203)               5
 6     32 (146.7735 -44.23203)               6
 7     32 (147.1199 -43.63027)               7
 8     50  (144.695 -42.63203)               1
 9     50 (145.0414 -42.03203)               2
10     50 (145.7342 -42.03203)               3

Assign species to positions

The melithreptus dataframe (created earlier using atlas_species()) requires a small amount of tidying to be compatible with the rest of our data. The species column contains subgenera, which we can remove with regular expressions (regex)4. We also need to ensure all species have a vernacular name, noting that Melithreptus chloropsis is currently lacking one in the ALA database. We can then assign a number (1-7) denoting each species’ position in a hexagon5.

species_data <- melithreptus |>
  select(species, vernacular_name) |>
  mutate(species = str_replace_all(species, "\\(.*?\\)\\s*", ""),
         vernacular_name = if_else(species == "Melithreptus chloropsis", 
                                   "Gilbert's Honeyeater",
                                   vernacular_name),
         vertex_position = c(1:7))

species_data
# A tibble: 7 × 3
  species                    vernacular_name           vertex_position
  <chr>                      <chr>                               <int>
1 Melithreptus lunatus       White-naped Honeyeater                  1
2 Melithreptus brevirostris  Brown-headed Honeyeater                 2
3 Melithreptus albogularis   White-throated Honeyeater               3
4 Melithreptus gularis       Black-chinned Honeyeater                4
5 Melithreptus affinis       Black-headed Honeyeater                 5
6 Melithreptus validirostris Strong-billed Honeyeater                6
7 Melithreptus chloropsis    Gilbert's Honeyeater                    7

Our final step is to bring these three dataframes (hex_with_species, species_data, vertex_centroid_coords) together with dplyr::left_join(). We can perform these joins with some key identifying columns, namely hex_id, species and vertex_position. We begin with our distinct hexagon and species combinations and join the species positions and common names using the species field…

species_points_a <- hex_with_species |>
  st_drop_geometry() |>
  select(hex_id, species) |>
  distinct() |> 
  left_join(species_data,
            by = join_by(species))

head(species_points_a, n = 10)
   hex_id                    species          vernacular_name vertex_position
1      32 Melithreptus validirostris Strong-billed Honeyeater               6
2      32       Melithreptus affinis  Black-headed Honeyeater               5
3      50 Melithreptus validirostris Strong-billed Honeyeater               6
4      50       Melithreptus affinis  Black-headed Honeyeater               5
5      51 Melithreptus validirostris Strong-billed Honeyeater               6
6      51       Melithreptus affinis  Black-headed Honeyeater               5
7      70       Melithreptus affinis  Black-headed Honeyeater               5
8      70 Melithreptus validirostris Strong-billed Honeyeater               6
9      88 Melithreptus validirostris Strong-billed Honeyeater               6
10     88       Melithreptus affinis  Black-headed Honeyeater               5

…and follow this with another join to bring in the coordinates of each species point inside the hexagons. These are uniquely identified by their hex_id and vertex_position, and we can assign the species to each vertex using the vertex_position column we also created in species_data.

species_points <- species_points_a |>
  left_join(vertex_centroid_coords,
            by = join_by(vertex_position, hex_id)) |> 
  st_as_sf(crs = 4326)

head(species_points, n = 10)
Simple feature collection with 10 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 145.0414 ymin: -44.23203 xmax: 149.1983 ymax: -41.23203
Geodetic CRS:  WGS 84
   hex_id                    species          vernacular_name vertex_position
1      32 Melithreptus validirostris Strong-billed Honeyeater               6
2      32       Melithreptus affinis  Black-headed Honeyeater               5
3      50 Melithreptus validirostris Strong-billed Honeyeater               6
4      50       Melithreptus affinis  Black-headed Honeyeater               5
5      51 Melithreptus validirostris Strong-billed Honeyeater               6
6      51       Melithreptus affinis  Black-headed Honeyeater               5
7      70       Melithreptus affinis  Black-headed Honeyeater               5
8      70 Melithreptus validirostris Strong-billed Honeyeater               6
9      88 Melithreptus validirostris Strong-billed Honeyeater               6
10     88       Melithreptus affinis  Black-headed Honeyeater               5
                     geometry
1  POINT (146.7735 -44.23203)
2  POINT (147.4663 -44.23203)
3  POINT (145.0414 -43.23203)
4  POINT (145.7342 -43.23203)
5  POINT (148.5055 -43.23203)
6  POINT (149.1983 -43.23203)
7  POINT (147.4663 -42.23203)
8  POINT (146.7735 -42.23203)
9  POINT (145.0414 -41.23203)
10 POINT (145.7342 -41.23203)

Map

Let’s check how our three spatial layers—basemap, hexagons, and species points—look on a map.

ggplot() +
  geom_sf(data = ozmap_states, fill = NA) +
  geom_sf(data = unique_hex, fill = NA) +
  geom_sf(data = species_points, aes(colour = vernacular_name)) +
  lims(x = c(112, 155), y = c(-46, -8)) +
  theme_void()

Since it’s all looking correct, we can add some final flourishes to make our map more aesthetically pleasing, as well as more accessible with a colourblind friendly palette by Paul Tol.

Code
font_add_google("Montserrat")
showtext_auto(enable = TRUE)

tol_muted <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499", "#44AA99")

ggplot() +
  geom_sf(data = ozmap_states, 
          fill = NA, colour = "#ababab", linewidth = 0.3) +
  geom_sf(data = unique_hex, 
          fill = "#efefef55", colour = "#777777", linewidth = 0.5) +
  geom_sf(data = species_points, aes(colour = vernacular_name), 
          size = 2.3) +
  scale_colour_manual(
    values = tol_muted,
    guide = guide_legend(title = "*Melithreptus* &#0020; species",
                         override.aes = list(size = 4))
  ) +
  lims(x = c(112, 155), y = c(-46, -8)) +
  theme_void() +
  theme(legend.title = element_markdown(family = "Montserrat", size = 24),
        legend.text = element_text(family = "Montserrat", size = 20),
        legend.spacing.x = unit(0, "in"))

Final Thoughts

This visualisation is a novel way to show range overlaps and distributions of multiple species at once. A key strength is the consistency of the repeatable hex unit - the fixed positions and colours of the species points within hexagons make it easier to follow patterns within or between species.

This is also a very flexible method: it is easily customisable with regards to 1) the size, shape (hexagons vs squares) and orientation of the polygons; 2) the colours and orientations of points within the hexagons; and 3) the spatial scale of the base map.

Consider also that you do not necessarily need to use exactly seven different species/taxa - with a bit of creativity, it is possible to fit any number of points from 2-9 into a hexagon (2-7) or square symmetrically…

Expand for session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Sonoma 14.2.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Australia/Melbourne
 date     2024-01-24
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
 forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
 galah       * 2.0.0   2023-11-20 [1] CRAN (R 4.3.1)
 ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.3.1)
 ggtext      * 0.1.2   2022-09-16 [1] CRAN (R 4.3.0)
 htmltools   * 0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
 lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.3.1)
 ozmaps      * 0.4.5   2021-08-03 [1] CRAN (R 4.3.0)
 patchwork   * 1.1.3   2023-08-14 [1] CRAN (R 4.3.0)
 purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
 readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
 sessioninfo * 1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 sf          * 1.0-14  2023-07-11 [1] CRAN (R 4.3.0)
 showtext    * 0.9-6   2023-05-03 [1] CRAN (R 4.3.0)
 showtextdb  * 3.0     2020-06-04 [1] CRAN (R 4.3.0)
 stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
 sysfonts    * 0.8.8   2022-03-13 [1] CRAN (R 4.3.0)
 tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
 tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────

Footnotes

  1. There are over half a million records from this genus in the ALA, so restricting our download to records from 2022 significantly speeds things up!↩︎

  2. Using distinct() produces an identical result to count() here, but is far slower because checking for distinct values in the geometry column is computationally intensive. If your dataframe has fewer rows, you could also do this: hex_with_species |> select(hex_id, hex_geometry) |> distinct().↩︎

  3. Each hexagon is formed as a closed (rather than open) polygon, whereby the vertices are joined in the following order: 1-2-3-4-5-6-1. So although there are only 6 vertices, we get 7 sets of coordinates, with the first and seventh sets being duplicated to close the polygon.↩︎

  4. Regular expressions, or regex, are used to match specific patterns in strings. Here, we want to remove the inclusion of subgenera, parentheses, and any extra spaces in species names (e.g. "Melithreptus (Melithreptus) affinis" to "Melithreptus affinis"), and we do this using species = str_replace_all(species, "\\(.*?\\)\\s*", ""). We’re looking for a sequence that starts with an opening parenthesis (\\(), is followed by any characters (.*?), and ends with a closing parenthesis (\\)). Any spaces following the closing parenthesis (\\s*) are also matched. Such sequences are replaced with an empty string (""), effectively removing them.↩︎

  5. Here we assign the positions simply with vertex_position = c(1:7), however you can reorder the dataframe or this position vector to have more control over which point in the hexagon each species is assigned. For instance, you might wish to do this to separate similar colours within the hexagon, or to assign the most widely distributed species to the centre point.↩︎